4 research outputs found

    FDive: Learning Relevance Models using Pattern-based Similarity Measures

    The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research. Comment: 12 pages, 7 figures, 2 tables, LaTeX; corrected typo; added DO

    Communication Analysis through Visual Analytics: Current Practices, Challenges, and New Frontiers

    The automated analysis of digital human communication data often focuses on specific aspects such as content or network structure in isolation. This can provide limited perspectives while making cross-methodological analyses, occurring in domains like investigative journalism, difficult. Communication research in psychology and the digital humanities instead stresses the importance of a holistic approach to overcome these limiting factors. In this work, we conduct an extensive survey on the properties of over forty semi-automated communication analysis systems and investigate how they cover concepts described in theoretical communication research. From these investigations, we derive a design space and contribute a conceptual framework based on communication research, technical considerations, and the surveyed approaches. The framework describes the systems' properties, capabilities, and composition through a wide range of criteria organized in the dimensions (1) Data, (2) Processing and Models, (3) Visual Interface, and (4) Knowledge Generation. These criteria enable a formalization of digital communication analysis through visual analytics, which, we argue, is uniquely suited for this task by tackling automation complexity while leveraging domain knowledge. With our framework, we identify shortcomings and research challenges, such as group communication dynamics, trust and privacy considerations, and holistic approaches. Simultaneously, our framework supports the evaluation of systems and promotes the mutual exchange between researchers through a structured common language, laying the foundations for future research on communication analysis. Comment: 11 pages, 2 tables, 1 figur

    VulnEx: Exploring Open-Source Software Vulnerabilities in Large Development Organizations to Understand Risk Exposure

    The prevalent usage of open-source software (OSS) has led to an increased interest in resolving potential third-party security risks by fixing common vulnerabilities and exposures (CVEs). However, even with automated code analysis tools in place, security analysts often lack the means to obtain an overview of vulnerable OSS reuse in large software organizations. In this design study, we propose VulnEx (Vulnerability Explorer), a tool to audit entire software development organizations. We introduce three complementary table-based representations to identify and assess vulnerability exposures due to OSS, which we designed in collaboration with security analysts. The presented tool allows examining problematic projects and applications (repositories), third-party libraries, and vulnerabilities across a software organization. We show the applicability of our tool through a use case and preliminary expert feedback.

    ParSetgnostics: Quality Metrics for Parallel Sets

    While there are many visualization techniques for exploring numeric data, only a few work with categorical data. One prominent example is Parallel Sets, showing data frequencies instead of data points - analogous to parallel coordinates for numerical data. As nominal data does not have an intrinsic order, the design of Parallel Sets is sensitive to visual clutter due to overlaps, crossings, and subdivision of ribbons hindering readability and pattern detection. In this paper, we propose a set of quality metrics, called ParSetgnostics (Parallel Sets diagnostics), which aim to improve Parallel Sets by reducing clutter. These quality metrics quantify important properties of Parallel Sets such as overlap, orthogonality, ribbon width variance, and mutual information to optimize the category and dimension ordering. By conducting a systematic correlation analysis between the individual metrics, we ensure their distinctiveness. Further, we evaluate the clutter reduction effect of ParSetgnostics by reconstructing six datasets from previous publications using Parallel Sets, measuring and comparing their respective properties. Our results show that ParSetgnostics facilitates multi-dimensional analysis of categorical data by automatically providing optimized Parallel Set designs with a clutter reduction of up to 81% compared to the originally proposed Parallel Sets visualizations.